168 research outputs found
Dissimilarity-based representation for radiomics applications
Radiomics is a term which refers to the analysis of the large amount of
quantitative tumor features extracted from medical images to find useful
predictive, diagnostic or prognostic information. Many recent studies have
proved that radiomics can offer a lot of useful information that physicians
cannot extract from the medical images and can be associated with other
information like gene or protein data. However, most of the classification
studies in radiomics report the use of feature selection methods without
identifying the machine learning challenges behind radiomics. In this paper, we
first show that the radiomics problem should be viewed as a high-dimensional,
low-sample-size, multi-view learning problem, then we compare different
solutions proposed in multi-view learning for classifying radiomics data. Our
experiments, conducted on several real-world multi-view datasets, show that the
intermediate integration methods work significantly better than the filter and
embedded feature selection methods commonly used in radiomics.
Comment: conference, 6 pages, 2 figures
Handwritten Document Analysis for Automatic Writer Recognition
In this paper, we show that both the writer identification and the writer verification tasks can be carried out using local features such as graphemes extracted from the segmentation of cursive handwriting. We thus enlarge the scope of these two tasks, which have so far been evaluated mainly on script handwriting. A text-based Information Retrieval model is used for the writer identification stage. This allows the use of a particular feature space based on feature frequencies. Image queries are handwritten documents projected into this feature space. The approach achieves 95% correct identification on the PSI_DataBase and 86% on the IAM_DataBase. The retrieved writer hypotheses are then analysed during a verification phase. We use a mutual information criterion to verify whether two documents may have been produced by the same writer. Hypothesis testing is used for this purpose. The proposed method is first scaled on the PSI_DataBase, then evaluated on the IAM_DataBase. On both databases, similar performance of nearly 96% correct verification is reported, which makes the approach general and very promising for large-scale applications in handwritten document querying and writer verification.
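As a rough illustration of retrieval in a frequency-based feature space, the sketch below ranks reference documents against a query with TF-IDF weights and cosine similarity. It is a minimal sketch under stated assumptions: the abstract does not specify the weighting or the similarity measure, so TF-IDF and cosine are stand-ins, and the feature ids ("g1", "g2", ...) and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn lists of feature ids into sparse TF-IDF vectors (dicts)."""
    n_docs = len(docs)
    doc_freq = Counter(f for doc in docs for f in set(doc))
    # Smoothed inverse document frequency (an assumed weighting).
    idf = {f: math.log((1 + n_docs) / (1 + df)) + 1.0 for f, df in doc_freq.items()}
    vectors = [{f: c * idf[f] for f, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_writers(query, reference_docs, writer_ids):
    """Return writer ids sorted from most to least similar to the query."""
    vectors, idf = tfidf_vectors(reference_docs)
    # Features never seen in the reference set are ignored.
    q_vec = {f: c * idf[f] for f, c in Counter(query).items() if f in idf}
    scores = [(cosine(q_vec, vec), w) for vec, w in zip(vectors, writer_ids)]
    return [w for _, w in sorted(scores, key=lambda s: s[0], reverse=True)]
```

The top-ranked writers would then play the role of the hypotheses passed to a verification stage.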
Dynamic voting in multi-view learning for radiomics applications
Cancer diagnosis and treatment often require a personalized analysis for each
patient nowadays, due to the heterogeneity among the different types of tumor
and among patients. Radiomics is a recent medical imaging field that has proved
promising over the past few years for achieving this personalization.
However, a recent study shows that most of the state-of-the-art works in
Radiomics fail to identify this problem as a multi-view learning task and that
multi-view learning techniques are generally more efficient. In this work, we
propose to further investigate the potential of one family of multi-view
learning methods based on Multiple Classifier Systems, where one classifier is
learnt on each view and all classifiers are combined afterwards. In particular,
we propose a random forest based dynamic weighted voting scheme, which
personalizes the combination of views for each new patient for classification
tasks. The proposed method is validated on several real-world Radiomics
problems.
Comment: 10 pages
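The idea of combining one classifier per view with weights that change from patient to patient can be sketched as follows. This is only an illustration: the abstract does not say how the weights are computed beyond being random forest based, so the confidence margin used here is an assumed stand-in, and all names are hypothetical.

```python
def confidence_margin(probas):
    """Stand-in weight: gap between the two most probable classes."""
    ranked = sorted(probas.values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]


def dynamic_weighted_vote(view_probas, view_weights=None):
    """Combine per-view class probabilities for a single patient.

    view_probas: one dict {class label: probability} per view.
    view_weights: one non-negative weight per view for this patient;
    defaults to the confidence margin of each view (an assumption).
    """
    if view_weights is None:
        view_weights = [confidence_margin(p) for p in view_probas]
    total = sum(view_weights)
    if total <= 0:
        raise ValueError("at least one view needs a positive weight")
    combined = {}
    for probas, weight in zip(view_probas, view_weights):
        for label, p in probas.items():
            combined[label] = combined.get(label, 0.0) + (weight / total) * p
    return max(combined, key=combined.get), combined
```

With static, equal weights this reduces to plain averaging of the views; letting the weights depend on the patient is what makes the vote dynamic.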
Adaptation de modèles de Markov cachés - Application à la reconnaissance de caractères imprimés
We present in this paper a new algorithm for the adaptation of hidden Markov models (HMMs). The principle of our iterative adaptive algorithm is to alternate an HMM structure adaptation stage with an HMM Gaussian MAP adaptation stage. This algorithm is applied to the recognition of printed characters, to adapt the models learned by a polyfont character recognition engine to new forms of characters. A comparison with the classic MAP and MLLR adaptations shows a slight increase in the performance of the recognition system.
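For orientation, the textbook form of the MAP update of a Gaussian mean is given below. The abstract does not state the exact update the authors use, so this is the standard formulation rather than their algorithm. The symbols are assumptions: \mu_0 is the prior mean from the original model, \tau the prior weight, x_t the adaptation observations, and \gamma_t the occupation probability of the Gaussian at time t.

```latex
\hat{\mu} = \frac{\tau \, \mu_0 + \sum_{t} \gamma_t \, x_t}{\tau + \sum_{t} \gamma_t}
```

With little adaptation data the estimate stays close to \mu_0, and as \sum_t \gamma_t grows it moves toward the weighted mean of the new observations.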
Détection de motifs graphiques dans des images de documents anciens
Graphical pattern detection consists of searching a collection of document images for the occurrences most similar to an image query. In this article, we propose an unsupervised system for pattern detection, with no need for prior segmentation, inspired by recent techniques in computer vision. Our approach relies on decomposing the documents into windows of various sizes and describing these windows with bags of visual words, all done offline to reduce computation time. A data compression technique recently proposed in image retrieval keeps memory usage reasonable, but requires approximating the computation of the distance to the query. Encouraging first results are obtained on DocExplore, a database of medieval documents.
Abstract: Pattern spotting consists of retrieving the most similar graphical patterns from a collection of document images. Inspired by recent advances in computer vision and word spotting techniques, we propose in this paper an unsupervised, segmentation-free pattern spotting system. Overall, the system includes a powerful patch-based framework, the bag-of-visual-words model, with an offline sliding window mechanism to avoid a heavy computational burden during the retrieval process. Our system takes advantage of recent compression and distance approximation techniques (product quantization and asymmetric distance computation) to efficiently index the large number of sub-windows produced by sliding windows, which allows small queries to be retrieved in a large indexed corpus.
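Product quantization with asymmetric distance computation can be sketched in a few lines. This is an illustrative sketch, not the system described above: the codebooks are assumed to be given (in practice they would be learned, for example with k-means on each sub-space), the vector length is assumed to be divisible by the number of codebooks, and all function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector, n_parts):
    """Cut a vector into n_parts contiguous sub-vectors of equal length."""
    size = len(vector) // n_parts
    return [vector[i * size:(i + 1) * size] for i in range(n_parts)]


def pq_encode(vector, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    codes = []
    for chunk, centroids in zip(split(vector, len(codebooks)), codebooks):
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(chunk, centroids[k]))
        codes.append(nearest)
    return codes


def adc_table(query, codebooks):
    """Per-query lookup table: distance from each query chunk to each centroid."""
    return [[squared_distance(chunk, c) for c in centroids]
            for chunk, centroids in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(table, codes):
    """Approximate squared distance between the raw query and an encoded vector."""
    return sum(table[j][code] for j, code in enumerate(codes))
```

The distance is asymmetric because the query is kept uncompressed while the indexed windows are stored only as centroid indices. The table is built once per query and each stored window then costs one lookup per sub-space.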
Alpha-Numerical Sequences Extraction in Handwritten Documents
In this paper, we introduce a system for extracting alpha-numerical sequences (keywords, numerical fields or alpha-numerical sequences) from unconstrained handwritten documents. Contrary to most of the approaches presented in the literature, our system relies on a global handwriting line model describing two kinds of information: i) the relevant information and ii) the irrelevant information, represented by a shallow parsing model. The shallow parsing of isolated text lines allows quick information extraction in any document while rejecting irrelevant information at the same time. Results on a public French incoming mail database show the efficiency of the approach.
A syntax directed method for numerical field extraction in incoming mail documents
In this article, we propose a generic method for the automatic localisation and recognition of numerical fields (phone
number, ZIP code, etc.) in unconstrained handwritten incoming mail documents. The method exploits the syntax of a
numerical field as an a priori knowledge to locate it in the document. A syntactical analysis based on Markov models
filters the connected component sequences that respect a particular syntax known by the system. Once extracted, the
fields are submitted to a numeral recognition process. Hence, we avoid full recognition of the document,
which is a difficult and time-consuming task. We show the efficiency of the method on a real incoming mail
document database.
In this article, we present a generic method for extracting and recognizing numerical
fields (telephone number, postal code, etc.) in unconstrained handwritten mail. The
extraction method exploits the syntax of the fields as a priori information to locate them. A
syntactic analyzer based on Markov models filters the component sequences that respect the
syntax of a field type known to the system. Our approach thus avoids full recognition of the
document, a delicate and computationally expensive operation, since only the located fields
are submitted to a recognition system. We show the effectiveness of the method on a database of
real handwritten incoming mail.
Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing
This paper presents a deep learning approach for image retrieval and pattern
spotting in digital collections of historical documents. First, a region
proposal algorithm detects object candidates in the document page images. Next,
deep learning models are used for feature extraction, considering two distinct
variants, which provide either real-valued or binary code representations.
Finally, candidate images are ranked by computing the feature similarity with a
given input query. A robust experimental protocol evaluates the proposed
approach considering each representation scheme (real-valued and binary code)
on the DocExplore image database. The experimental results show that the
proposed deep models compare favorably to the state-of-the-art image retrieval
approaches for images of historical documents, outperforming other deep models
by 2.56 percentage points using the same techniques for pattern spotting.
In addition, the proposed approach reduces the search time by up to 200x and
the storage cost by up to 6,000x when compared to related works based on
real-valued representations.
Comment: 7 pages
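Ranking candidates by similarity of binary codes is commonly done with the Hamming distance, which the sketch below illustrates. The abstract does not name the similarity measure, so Hamming distance is an assumption here, as are the integer encoding of the codes and the function names.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, candidate_codes):
    """Return candidate indices sorted from nearest to farthest code."""
    return sorted(range(len(candidate_codes)),
                  key=lambda i: hamming_distance(query_code, candidate_codes[i]))
```

Storing each code as a packed integer instead of a real-valued vector is the kind of choice that yields the storage and search-time savings such methods aim for, since a comparison reduces to an XOR and a bit count.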